Locally Linear Embedding (LLE) — Manifold Learning (From Scratch)#
Locally Linear Embedding (LLE) is a nonlinear dimensionality reduction method that tries to “unfold” a manifold by preserving how each point can be reconstructed from its nearest neighbors.
Learning goals#
Build intuition for why LLE works (local geometry → global coordinates)
Derive the local reconstruction weights and the global embedding objective
See how LLE becomes an eigenvalue problem
Implement standard LLE from scratch in NumPy
Visualize neighborhood reconstruction + weight preservation with Plotly
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.preprocessing import StandardScaler
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)
rng = np.random.default_rng(42)
Prerequisites#
Linear algebra basics (matrix solve, eigenvectors)
K-nearest neighbors (KNN)
A mental model of a manifold: high-dimensional data that is “locally simple”
1) Intuition#
“Rebuilding each point from its neighbors”
LLE starts with a very local question:
If I look only at the K nearest neighbors of a point, can I express that point as a linear combination of those neighbors?
Think of each neighborhood as a tiny patch of the manifold. Even if the manifold is curved globally, small patches are often close to flat.
“Local recipes that must still work globally”
Once we learn a “recipe” (the neighbor weights) for each point in the original space, LLE searches for low-dimensional coordinates where the same recipes still reconstruct the points. The embedding is the set of coordinates that makes all of those local recipes consistent at once.
# A small 2D intuition picture: one point + its neighbors
n_curve = 240
theta = np.linspace(0, 2 * np.pi, n_curve, endpoint=False)
X_curve = np.c_[np.cos(theta), np.sin(theta)] + 0.03 * rng.normal(size=(n_curve, 2))
i0 = 40
k0 = 10
dist2 = np.sum((X_curve - X_curve[i0]) ** 2, axis=1)
nn0 = np.argsort(dist2)[1 : k0 + 1]
fig = go.Figure()
fig.add_trace(
go.Scatter(
x=X_curve[:, 0],
y=X_curve[:, 1],
mode="markers",
marker=dict(size=6, color="rgba(120,120,120,0.35)"),
name="data",
)
)
fig.add_trace(
go.Scatter(
x=X_curve[nn0, 0],
y=X_curve[nn0, 1],
mode="markers",
marker=dict(size=10, color="#1f77b4"),
name=f"{k0} nearest neighbors",
)
)
fig.add_trace(
go.Scatter(
x=[X_curve[i0, 0]],
y=[X_curve[i0, 1]],
mode="markers",
marker=dict(size=14, color="black", symbol="star"),
name="target point",
)
)
# Light “spokes” to show the neighborhood
for j in nn0:
fig.add_trace(
go.Scatter(
x=[X_curve[i0, 0], X_curve[j, 0]],
y=[X_curve[i0, 1], X_curve[j, 1]],
mode="lines",
line=dict(color="rgba(31,119,180,0.25)", width=2),
showlegend=False,
)
)
fig.update_layout(
title="Intuition: neighborhoods are small patches of the manifold",
xaxis_title="x₁",
yaxis_title="x₂",
width=750,
height=520,
)
fig.show()
2) Mathematical Explanation#
2.1 Local linear reconstruction weights#
Let the dataset be points \(x_1, \dots, x_n\) with \(x_i \in ℝ^D\).
For each point \(x_i\), pick its neighbor index set \(N(i)\) (usually via KNN). LLE chooses weights \(w_{ij}\) (nonzero only for \(j \in N(i)\)) that minimize the local reconstruction error:
\[
E(W) = \sum_{i=1}^{n} \Big\| x_i - \sum_{j \in N(i)} w_{ij}\, x_j \Big\|^2
\quad \text{subject to} \quad \sum_{j \in N(i)} w_{ij} = 1 .
\]
The sum-to-one constraint makes the weights invariant to translations (shifting all points by the same vector).
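To see why, shift every point by a fixed vector \(v\); because the weights sum to one, the shift cancels inside the reconstruction error:
\[
(x_i + v) - \sum_{j \in N(i)} w_{ij}\,(x_j + v)
= x_i - \sum_{j \in N(i)} w_{ij}\, x_j .
\]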
A common way to compute \(w_i\):
Build the neighbor difference matrix \(Z_i \in ℝ^{K \times D}\) with rows \(x_j - x_i\) for \(j \in N(i)\).
Form the local covariance (Gram) matrix \(C_i = Z_i Z_i^T \in ℝ^{K \times K}\).
Solve \(C_i w_i = \mathbf{1}\) (the all-ones vector) and normalize \(w_i\) so its entries sum to 1.
Add a small regularization term to \(C_i\) for numerical stability (especially when neighbors are nearly collinear).
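As a minimal sketch of this recipe for a single point, on a tiny random dataset (hypothetical `X`, `i`, and neighbor set `Ni`, shown only to make the linear algebra concrete; the full implementation follows in Section 3):
# Toy data: n=20 points in D=3; reconstruct point i from its K=5 nearest neighbors.
X = np.random.default_rng(1).normal(size=(20, 3))
i = 0
Ni = np.argsort(np.sum((X - X[i]) ** 2, axis=1))[1:6]  # 5 nearest neighbors of x_i
Z = X[Ni] - X[i]                            # (K, D) neighbor differences
C = Z @ Z.T                                 # (K, K) local Gram matrix
C += 1e-3 * np.trace(C) * np.eye(len(Ni))   # regularize near-collinear neighborhoods
w = np.linalg.solve(C, np.ones(len(Ni)))    # solve C_i w_i = 1
w /= w.sum()                                # enforce the sum-to-one constraint
print(w, w.sum())                           # the weights sum to 1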
2.2 Global cost function (preserve the weights)#
After we have weights, we search for low-dimensional coordinates \(y_i \in ℝ^d\) that preserve those same reconstructions:
\[
\Phi(Y) = \sum_{i=1}^{n} \Big\| y_i - \sum_{j=1}^{n} W_{ij}\, y_j \Big\|^2 .
\]
Here \(W\) is an \(n \times n\) matrix where \(W_{ij}\) is nonzero only if \(j \in N(i)\).
Write the objective in matrix form (with \(Y \in ℝ^{n \times d}\)):
\[
\Phi(Y) = \operatorname{tr}\!\big( Y^\top M Y \big), \qquad M = (I - W)^\top (I - W) .
\]
To avoid the trivial solution \(Y = 0\), we constrain \(Y\): center it (\(\sum_i y_i = 0\)) and fix its scale (\(\frac{1}{n} Y^\top Y = I\)). Minimizing the trace under those constraints yields an eigenvalue problem:
\[
M v = \lambda v .
\]
The embedding \(Y\) is formed from the \(d\) eigenvectors of \(M\) with the smallest eigenvalues, skipping the very smallest (\(\lambda = 0\)), whose eigenvector is the constant vector.
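A quick numerical check of that last claim: any row-stochastic \(W\) (rows summing to one, as LLE's weights do) satisfies \((I - W)\mathbf{1} = 0\), so the constant vector lies in the null space of \(M\). A tiny self-contained sketch with a random \(W\) (not real LLE weights):
rng_check = np.random.default_rng(0)
W_check = rng_check.random((5, 5))
np.fill_diagonal(W_check, 0.0)                    # a point never reconstructs itself
W_check /= W_check.sum(axis=1, keepdims=True)     # rows sum to 1, like LLE weights
M_check = (np.eye(5) - W_check).T @ (np.eye(5) - W_check)
print(M_check @ np.ones(5))                       # ≈ 0: constant vector, eigenvalue 0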
3) Step-by-Step Algorithm#
Neighbor selection
For each point \(x_i\), find the indices of its \(K\) nearest neighbors \(N(i)\).
Weight computation (local step)
For each point \(x_i\), solve for weights \(w_i\) that reconstruct \(x_i\) from \(X[N(i)]\).
Enforce \(\sum_j w_{ij} = 1\).
Global embedding
Build the sparse weight matrix \(W\).
Form \(M = (I - W)^T(I - W)\).
Compute the bottom eigenvectors of \(M\) and take the smallest non-trivial ones as the embedding coordinates.
def pairwise_squared_distances(X: np.ndarray) -> np.ndarray:
"""Compute dense pairwise squared Euclidean distances.
Returns an (n, n) matrix with `np.inf` on the diagonal.
"""
X = np.asarray(X, dtype=float)
if X.ndim != 2:
raise ValueError("X must be a 2D array of shape (n_samples, n_features)")
n = X.shape[0]
if n < 2:
raise ValueError("X must contain at least 2 samples")
X_norm = np.sum(X**2, axis=1, keepdims=True)
dist2 = X_norm + X_norm.T - 2.0 * (X @ X.T)
dist2 = np.maximum(dist2, 0.0)
np.fill_diagonal(dist2, np.inf)
return dist2
def knn_indices_from_dist2(dist2: np.ndarray, k: int) -> np.ndarray:
"""Return (n, k) neighbor indices given a pairwise distance matrix."""
dist2 = np.asarray(dist2, dtype=float)
if dist2.ndim != 2 or dist2.shape[0] != dist2.shape[1]:
raise ValueError("dist2 must be a square (n,n) distance matrix")
n = dist2.shape[0]
if not (1 <= k < n):
raise ValueError(f"k must satisfy 1 <= k < n (got k={k}, n={n})")
idx = np.argpartition(dist2, kth=k - 1, axis=1)[:, :k]
rows = np.arange(n)[:, None]
order = np.argsort(dist2[rows, idx], axis=1)
idx = idx[rows, order]
return idx
def lle_local_weights(X: np.ndarray, neighbors: np.ndarray, reg: float = 1e-3) -> np.ndarray:
"""Compute LLE reconstruction weights for each point.
Returns weights with shape (n, k), aligned with `neighbors`.
"""
X = np.asarray(X, dtype=float)
neighbors = np.asarray(neighbors)
if X.ndim != 2:
raise ValueError("X must be a 2D array of shape (n_samples, n_features)")
if neighbors.ndim != 2:
raise ValueError("neighbors must be a 2D integer array of shape (n, k)")
n, d = X.shape
if neighbors.shape[0] != n:
raise ValueError("neighbors must have the same number of rows as X")
k = neighbors.shape[1]
if not (1 <= k < n):
raise ValueError(f"neighbors k must satisfy 1 <= k < n (got k={k}, n={n})")
if np.any(neighbors < 0) or np.any(neighbors >= n):
raise ValueError("neighbors contains out-of-range indices")
weights = np.zeros((n, k), dtype=float)
ones = np.ones(k, dtype=float)
for i in range(n):
Ni = neighbors[i]
Z = X[Ni] - X[i] # (k, d)
C = Z @ Z.T # (k, k)
tr = np.trace(C)
C += (reg * tr if tr > 0 else reg) * np.eye(k)
try:
w = np.linalg.solve(C, ones)
except np.linalg.LinAlgError:
w = np.linalg.lstsq(C, ones, rcond=None)[0]
w_sum = np.sum(w)
if np.isclose(w_sum, 0.0):
w = ones / k
else:
w /= w_sum
weights[i] = w
return weights
class ScratchLLE:
def __init__(self, n_neighbors: int = 12, n_components: int = 2, reg: float = 1e-3):
self.n_neighbors = int(n_neighbors)
self.n_components = int(n_components)
self.reg = float(reg)
def fit(self, X: np.ndarray):
X = np.asarray(X, dtype=float)
if X.ndim != 2:
raise ValueError("X must be a 2D array of shape (n_samples, n_features)")
n = X.shape[0]
if n < 2:
raise ValueError("X must contain at least 2 samples")
if not (1 <= self.n_components < n):
raise ValueError(
f"n_components must satisfy 1 <= n_components < n (got {self.n_components}, n={n})"
)
if not (1 <= self.n_neighbors < n):
raise ValueError(
f"n_neighbors must satisfy 1 <= n_neighbors < n (got {self.n_neighbors}, n={n})"
)
dist2 = pairwise_squared_distances(X)
neighbors = knn_indices_from_dist2(dist2, self.n_neighbors)
weights = lle_local_weights(X, neighbors, reg=self.reg)
W = np.zeros((n, n), dtype=float)
rows = np.arange(n)[:, None]
W[rows, neighbors] = weights
M = np.eye(n) - W
M = M.T @ M
evals, evecs = np.linalg.eigh(M)
Y = evecs[:, 1 : self.n_components + 1]
self.embedding_ = Y
self.neighbors_ = neighbors
self.weights_ = weights
self.eigenvalues_ = evals
return self
def fit_transform(self, X: np.ndarray) -> np.ndarray:
return self.fit(X).embedding_
def lle_from_scratch(
X: np.ndarray,
n_neighbors: int = 12,
n_components: int = 2,
reg: float = 1e-3,
) -> tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
"""Convenience wrapper around `ScratchLLE` (standard LLE)."""
model = ScratchLLE(n_neighbors=n_neighbors, n_components=n_components, reg=reg).fit(X)
return model.embedding_, model.neighbors_, model.weights_, model.eigenvalues_
4) Plotly Visualizations#
We’ll use a classic benchmark: the Swiss roll. It’s a 2D surface embedded in 3D.
4.1 Swiss roll → 2D embedding#
n_samples = 900
X_swiss, t = make_swiss_roll(n_samples=n_samples, noise=0.06, random_state=42)
X_swiss = StandardScaler().fit_transform(X_swiss)
n_neighbors = 12
n_components = 2
scratch_lle = ScratchLLE(n_neighbors=n_neighbors, n_components=n_components, reg=1e-3).fit(X_swiss)
Y_scratch = scratch_lle.embedding_
neighbors_swiss = scratch_lle.neighbors_
weights_swiss = scratch_lle.weights_
evals = scratch_lle.eigenvalues_
X_hat = np.sum(weights_swiss[..., None] * X_swiss[neighbors_swiss], axis=1)
recon_err = np.linalg.norm(X_swiss - X_hat, axis=1)
print(
f"Local reconstruction error: mean={recon_err.mean():.4f}, "
f"95th%={np.percentile(recon_err, 95):.4f}"
)
fig = px.histogram(recon_err, nbins=40, title="Local reconstruction errors (||x - x̂||)")
fig.update_layout(xaxis_title="||x - x̂||", yaxis_title="count", width=850, height=420)
fig.show()
fig = px.scatter_3d(
x=X_swiss[:, 0],
y=X_swiss[:, 1],
z=X_swiss[:, 2],
color=t,
title="Swiss roll in 3D (color = intrinsic parameter)",
labels={"x": "x₁", "y": "x₂", "z": "x₃"},
)
fig.update_traces(marker=dict(size=3, opacity=0.8))
fig.show()
fig = px.scatter(
x=Y_scratch[:, 0],
y=Y_scratch[:, 1],
color=t,
title=f"LLE embedding from scratch (K={n_neighbors}, d={n_components})",
labels={"x": "y₁", "y": "y₂"},
)
fig.update_traces(marker=dict(size=5, opacity=0.9))
fig.show()
Local reconstruction error: mean=0.0053, 95th%=0.0124
4.2 Local neighborhood reconstruction#
For one point \(x_i\), we can visualize:
its neighbor set \(N(i)\)
the reconstructed point \(\hat{x}_i = \sum_{j \in N(i)} w_{ij} x_j\)
The reconstruction should be close in the original space if the neighborhood is locally “flat enough”.
def plot_local_reconstruction_3d(
X: np.ndarray,
t: np.ndarray,
neighbors: np.ndarray,
weights: np.ndarray,
i: int,
) -> go.Figure:
Ni = neighbors[i]
wi = weights[i]
x_i = X[i]
x_hat = wi @ X[Ni]
err = float(np.linalg.norm(x_i - x_hat))
fig = go.Figure()
fig.add_trace(
go.Scatter3d(
x=X[:, 0],
y=X[:, 1],
z=X[:, 2],
mode="markers",
marker=dict(size=2, color="rgba(140,140,140,0.25)"),
name="all points",
)
)
fig.add_trace(
go.Scatter3d(
x=X[Ni, 0],
y=X[Ni, 1],
z=X[Ni, 2],
mode="markers",
marker=dict(
size=7,
color=wi,
colorscale="RdBu",
reversescale=True,
colorbar=dict(title="weight"),
),
name="neighbors (colored by weight)",
)
)
fig.add_trace(
go.Scatter3d(
x=[x_i[0]],
y=[x_i[1]],
z=[x_i[2]],
mode="markers",
marker=dict(size=9, color="black", symbol="diamond"),
name=f"x[{i}]",
)
)
fig.add_trace(
go.Scatter3d(
x=[x_hat[0]],
y=[x_hat[1]],
z=[x_hat[2]],
mode="markers",
marker=dict(size=9, color="#d62728", symbol="x"),
name="reconstruction (x̂)",
)
)
fig.add_trace(
go.Scatter3d(
x=[x_i[0], x_hat[0]],
y=[x_i[1], x_hat[1]],
z=[x_i[2], x_hat[2]],
mode="lines",
line=dict(color="#d62728", width=6),
showlegend=False,
)
)
fig.update_layout(
title=f"Local reconstruction in 3D (i={i}, ||x - x̂||={err:.3e})",
scene=dict(xaxis_title="x₁", yaxis_title="x₂", zaxis_title="x₃"),
width=900,
height=620,
)
return fig
i_demo = 250
plot_local_reconstruction_3d(X_swiss, t, neighbors_swiss, weights_swiss, i_demo).show()
4.3 Weight preservation illustration#
In the embedding, LLE tries to make each point satisfy the same reconstruction:
\[
y_i \approx \sum_{j \in N(i)} w_{ij}\, y_j .
\]
Below, we apply the same weights to neighbors in both the original space and the embedding.
def plot_weight_preservation(
X: np.ndarray,
Y: np.ndarray,
neighbors: np.ndarray,
weights: np.ndarray,
i: int,
) -> go.Figure:
Ni = neighbors[i]
wi = weights[i]
x_i = X[i]
x_hat = wi @ X[Ni]
err_x = float(np.linalg.norm(x_i - x_hat))
y_i = Y[i]
y_hat = wi @ Y[Ni]
err_y = float(np.linalg.norm(y_i - y_hat))
fig = make_subplots(
rows=1,
cols=2,
specs=[[{"type": "scene"}, {"type": "xy"}]],
subplot_titles=(
f"Original space (||x - x̂||={err_x:.3e})",
f"Embedding space (||y - ŷ||={err_y:.3e})",
),
)
# Original space (3D)
fig.add_trace(
go.Scatter3d(
x=X[:, 0],
y=X[:, 1],
z=X[:, 2],
mode="markers",
marker=dict(size=2, color="rgba(140,140,140,0.22)"),
showlegend=False,
),
row=1,
col=1,
)
fig.add_trace(
go.Scatter3d(
x=X[Ni, 0],
y=X[Ni, 1],
z=X[Ni, 2],
mode="markers",
marker=dict(size=7, color=wi, colorscale="RdBu", reversescale=True),
name="neighbors",
),
row=1,
col=1,
)
fig.add_trace(
go.Scatter3d(
x=[x_i[0]],
y=[x_i[1]],
z=[x_i[2]],
mode="markers",
marker=dict(size=9, color="black", symbol="diamond"),
name="x",
),
row=1,
col=1,
)
fig.add_trace(
go.Scatter3d(
x=[x_hat[0]],
y=[x_hat[1]],
z=[x_hat[2]],
mode="markers",
marker=dict(size=9, color="#d62728", symbol="x"),
name="x̂",
),
row=1,
col=1,
)
# Embedding space (2D)
fig.add_trace(
go.Scatter(
x=Y[:, 0],
y=Y[:, 1],
mode="markers",
marker=dict(size=6, color="rgba(140,140,140,0.25)"),
showlegend=False,
),
row=1,
col=2,
)
fig.add_trace(
go.Scatter(
x=Y[Ni, 0],
y=Y[Ni, 1],
mode="markers",
marker=dict(size=11, color=wi, colorscale="RdBu", reversescale=True, showscale=True),
name="neighbors (weights)",
),
row=1,
col=2,
)
fig.add_trace(
go.Scatter(
x=[y_i[0]],
y=[y_i[1]],
mode="markers",
marker=dict(size=14, color="black", symbol="diamond"),
name="y",
),
row=1,
col=2,
)
fig.add_trace(
go.Scatter(
x=[y_hat[0]],
y=[y_hat[1]],
mode="markers",
marker=dict(size=14, color="#d62728", symbol="x"),
name="ŷ",
),
row=1,
col=2,
)
fig.update_layout(
title=f"Weight preservation (same neighborhood weights, i={i})",
width=1100,
height=520,
)
fig.update_scenes(xaxis_title="x₁", yaxis_title="x₂", zaxis_title="x₃", row=1, col=1)
fig.update_xaxes(title_text="y₁", row=1, col=2)
fig.update_yaxes(title_text="y₂", row=1, col=2)
return fig
plot_weight_preservation(X_swiss, Y_scratch, neighbors_swiss, weights_swiss, i_demo).show()
4.4 Compare with sklearn (sanity check)#
Different implementations can produce embeddings that are rotated/flipped/scaled differently, but the geometry should match.
lle = LocallyLinearEmbedding(
n_neighbors=n_neighbors,
n_components=n_components,
method="standard",
random_state=42,
)
Y_sklearn = lle.fit_transform(X_swiss)
fig = px.scatter(
x=Y_sklearn[:, 0],
y=Y_sklearn[:, 1],
color=t,
title="`sklearn` LocallyLinearEmbedding (standard)",
labels={"x": "y₁", "y": "y₂"},
)
fig.update_traces(marker=dict(size=5, opacity=0.9))
fig.show()
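One way to make "the geometry should match" concrete is a Procrustes comparison, which scores the two embeddings after factoring out translation, uniform scale, and rotation/reflection. A minimal sketch, assuming `scipy` is available:
from scipy.spatial import procrustes

# Disparity = sum of squared differences after the optimal similarity alignment;
# 0 means identical up to translation/scale/rotation/reflection.
_, _, disparity = procrustes(Y_sklearn, Y_scratch)
print(f"Procrustes disparity: {disparity:.4e}")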
5) Strengths & Weaknesses#
Strengths#
Captures nonlinear structure while preserving local geometry.
Works well when data lies on a smooth manifold and neighborhoods are well-sampled.
No gradient descent needed for the standard method (solve local linear systems + an eigenproblem).
Weaknesses#
Manifold assumptions: if the data is not locally linear (or not a manifold), LLE can distort structure.
Noise sensitivity: noisy neighbors → unstable weights → poor embeddings.
Choice of K is critical: too small → disconnected graph; too large → “short-circuits” across the manifold.
Global step needs an eigen-decomposition (can be expensive for large \(n\) without sparse solvers; see the sketch below).
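For large \(n\), the usual remedy for that last point is to keep \(W\) sparse (only \(K\) nonzeros per row) and use a shift-invert sparse eigensolver, as `sklearn` does with `eigen_solver="arpack"`. A hedged sketch, assuming `scipy` is available:
from scipy.sparse import csr_matrix, identity
from scipy.sparse.linalg import eigsh

def bottom_eigenvectors_sparse(W: np.ndarray, d: int) -> np.ndarray:
    """Bottom non-trivial eigenvectors of M = (I - W)^T (I - W) via ARPACK."""
    n = W.shape[0]
    IW = identity(n, format="csr") - csr_matrix(W)  # W has only K nonzeros per row
    M = (IW.T @ IW).tocsc()
    evals, evecs = eigsh(M, k=d + 1, sigma=0.0)     # shift-invert: eigenvalues near 0
    order = np.argsort(evals)
    return evecs[:, order[1 : d + 1]]               # skip the constant eigenvector
Dropping this into `ScratchLLE.fit` in place of `np.linalg.eigh` should scale much better, since it avoids forming and decomposing a dense \(n \times n\) matrix.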
Exercises#
Try different values of `n_neighbors` (e.g., 6, 12, 24). When does the embedding break?
Add more noise to the Swiss roll. How do the local reconstruction errors change?
Replace the brute-force neighbor search with `sklearn.neighbors.NearestNeighbors` and compare runtime.
References#
Roweis & Saul (2000), Nonlinear Dimensionality Reduction by Locally Linear Embedding.
`sklearn.manifold.LocallyLinearEmbedding` documentation.